Visualizing Spatial Data

This script will serve as a reference for plotting methods (traditional and trellis). As usual, set up the environment...

In [2]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

#Set print width
pd.set_option('line_width',140)

#Enable R capability
%load_ext rmagic 

#Set working directory (cloned from Github)
workdir='/home/choct155/u46e_linux/choct155/b7be6eab-3c3c-4b68-8148-1321c1e344ae/home/choct155/analysis/asdar/asdar/'

First, we need to pull in our dataset that we will be using for this exercise.

In [3]:
%%R
library(sp)
data(meuse)

#Examin data
print(str(meuse))
print(summary(meuse))
'data.frame':	155 obs. of  14 variables:
 $ x      : num  181072 181025 181165 181298 181307 ...
 $ y      : num  333611 333558 333537 333484 333330 ...
 $ cadmium: num  11.7 8.6 6.5 2.6 2.8 3 3.2 2.8 2.4 1.6 ...
 $ copper : num  85 81 68 81 48 61 31 29 37 24 ...
 $ lead   : num  299 277 199 116 117 137 132 150 133 80 ...
 $ zinc   : num  1022 1141 640 257 269 ...
 $ elev   : num  7.91 6.98 7.8 7.66 7.48 ...
 $ dist   : num  0.00136 0.01222 0.10303 0.19009 0.27709 ...
 $ om     : num  13.6 14 13 8 8.7 7.8 9.2 9.5 10.6 6.3 ...
 $ ffreq  : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ soil   : Factor w/ 3 levels "1","2","3": 1 1 1 2 2 2 2 1 1 2 ...
 $ lime   : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
 $ landuse: Factor w/ 15 levels "Aa","Ab","Ag",..: 4 4 4 11 4 11 4 2 2 15 ...
 $ dist.m : num  50 30 150 270 380 470 240 120 240 420 ...
NULL
       x                y             cadmium           copper      
 Min.   :178605   Min.   :329714   Min.   : 0.200   Min.   : 14.00  
 1st Qu.:179371   1st Qu.:330762   1st Qu.: 0.800   1st Qu.: 23.00  
 Median :179991   Median :331633   Median : 2.100   Median : 31.00  
 Mean   :180005   Mean   :331635   Mean   : 3.246   Mean   : 40.32  
 3rd Qu.:180630   3rd Qu.:332463   3rd Qu.: 3.850   3rd Qu.: 49.50  
 Max.   :181390   Max.   :333611   Max.   :18.100   Max.   :128.00  
                                                                    
      lead            zinc             elev             dist        
 Min.   : 37.0   Min.   : 113.0   Min.   : 5.180   Min.   :0.00000  
 1st Qu.: 72.5   1st Qu.: 198.0   1st Qu.: 7.546   1st Qu.:0.07569  
 Median :123.0   Median : 326.0   Median : 8.180   Median :0.21184  
 Mean   :153.4   Mean   : 469.7   Mean   : 8.165   Mean   :0.24002  
 3rd Qu.:207.0   3rd Qu.: 674.5   3rd Qu.: 8.955   3rd Qu.:0.36407  
 Max.   :654.0   Max.   :1839.0   Max.   :10.520   Max.   :0.88039  
                                                                    
       om         ffreq  soil   lime       landuse       dist.m      
 Min.   : 1.000   1:84   1:97   0:111   W      :50   Min.   :  10.0  
 1st Qu.: 5.300   2:48   2:46   1: 44   Ah     :39   1st Qu.:  80.0  
 Median : 6.900   3:23   3:12           Am     :22   Median : 270.0  
 Mean   : 7.478                         Fw     :10   Mean   : 290.3  
 3rd Qu.: 9.000                         Ab     : 8   3rd Qu.: 450.0  
 Max.   :17.000                         (Other):25   Max.   :1000.0  
 NA's   :2                              NA's   : 1                   

The first plot will be simple points. Note that we must first establish which variables to use as coordinates.

In [4]:
%%R
#Set coordinates
coordinates(meuse)<-c('x','y')

#Plot points
print(plot(meuse))

#Set title
title('points')
NULL

Note that by establishing a coords slot for meuse (via the coordinates() call), the input data.frame has automatically been converted to a SpatialPointsDataFrame object. It must be remembered that trying to reassign the same coordinates to meuse after this conversion will throw an exception. If the cell must be re-run, the input data.frame must be read in again.

To build a SpatialLines object, the coordinates must be connected in sequence.

In [5]:
%%R
cc<-coordinates(meuse)
print(cc[1:10,])
           x      y
 [1,] 181072 333611
 [2,] 181025 333558
 [3,] 181165 333537
 [4,] 181298 333484
 [5,] 181307 333330
 [6,] 181390 333260
 [7,] 181165 333370
 [8,] 181027 333363
 [9,] 181060 333231
[10,] 181232 333168

Recall the class structure. The lines slot in a SpatialLines object is a list of Lines objects, which are themselves lists of Line objects. We must take each hierarchical level into account in building the SpatialLines object from a coordinate array. Additionally, each Lines object must be given a valid ID (hence the '1').

In [6]:
%%R
#Build SL object
m.sl<-SpatialLines(list(Lines(list(Line(cc)),'1')))

#Plot lines
print(plot(m.sl))

#Set title
title('lines')
NULL

Now to the interesting data model for our purposes, polygons. For this we are looking at a different set (meuse.riv) holding information for the plotting of a river. Again we have to navigate nested classes to get to a SpatialPolygons object, and again we must provide the objects that sit in the primary spatial slot with a valid identifier. In this case, meuse.riv contains a single polygon.

In [7]:
%%R
data(meuse.riv)

#Build list that sits in polygons slot of SpatialPolygons
meuse.lst<-list(Polygons(list(Polygon(meuse.riv)),'meuse.riv'))

#Build SpatialPolygons object
meuse.sr<-SpatialPolygons(meuse.lst)

#Plot polygon
print(plot(meuse.sr,col='blue'))

#Set title
title('polygon')
NULL

Just for the purposes of a composite image, we will run quickly through the grid data construction...

In [8]:
%%R
data(meuse.grid)
coordinates(meuse.grid)<-c('x','y')
meuse.grid<-as(meuse.grid,'SpatialPixels')
image(meuse.grid,col='grey')
title('grid')

Now to build the composite...

In [9]:
%%R
image(meuse.grid, col='tan')
plot(meuse.sr,col='blue',add=TRUE)
plot(meuse,add=TRUE)

We can create panel layouts and customize axes...

In [10]:
%%R
#Generate layout for side by side plots
layout(matrix(c(1,2),1,2))

#Plot each panel
plot(meuse.sr,axes=TRUE)
plot(meuse.sr,axes=FALSE)

#Set the tick intervals and range of the x and y axes
axis(1,at=c(178000+0:2*2000),cex.axis=.7)
axis(2,at=c(326000+0:3*4000),cex.axis=.7)

If we want to show attributes of spatial features, we must use type, size, or color of symbols, lines or polygons to convey information. We will plot the meuse information yet again, even going so far as to use the pixel grid to plot interpolated zinc concentration as a background image.

In [11]:
%%R -w 600 -h 800
library(spatstat)
#print(str(meuse))
#Set interpolation color range for zinc concentration
grays=gray.colors(4,.55,.95)

#Plot zinc concentration
#zn.idw <- idw(log(zinc) ~ 1, meuse, meuse.grid)
#image(zn.idw, col=grays, breaks=log(c(100,200,400,800,1800)))

#Plot river
plot(meuse.riv,add=TRUE)

#Plot points
plot(meuse,pch=1,cex=sqrt(meuse$zinc)/20,add=TRUE)

#Set legend 
legVals<-c(100,200,500,1000,2000)

#Create first legend
legend('left',legend=legVals,pch=1,pt.cex=sqrt(legVals)/20,bty='n',title='measured')

#Create another legend
legend('topleft',legend=c('100-200','200-400','400-800','800-1800'),fill=grays,bty='n',title='interpolated')
Loading required package: mgcv
This is mgcv 1.7-24. For overview type 'help("mgcv-package")'.
Loading required package: deldir
deldir 0.0-22
spatstat 1.31-3 
Type ‘help(spatstat)’ for an overview of spatstat 
     ‘latest.news()’ for news on latest version 
     ‘licence.polygons()’ for licence information on polygon calculations

Moving in a different direction, we can work with areal data as well. Here is an example in which we can actually identify individual polygons...

In [12]:
%%R
library(maptools)

#Set projection 
prj<-CRS("+proj=longlat +datum=NAD27")

#Grab shapefile from maptools
nc_shp<-system.file('shapes/sids.shp',
                    package='maptools')#[1]

print(nc_shp)
#Read shapefile
nc<-readShapePoly(nc_shp,proj4string=prj)

plot(nc)

#Establish coordinate identifier (identify provides labels instead of coordinates)
pt<-locator(type='p')  #Not yet clear on sequence of operations here...

print(pt)
Loading required package: foreign
Loading required package: grid
Loading required package: lattice
Checking rgeos availability: FALSE
 	Note: when rgeos is not available, polygon geometry 	computations in maptools depend on gpclib,
 	which has a restricted licence. It is disabled by default;
 	to enable gpclib, type gpclibPermit()
[1] "/home/choct155/R/x86_64-pc-linux-gnu-library/3.0/maptools/shapes/sids.shp"
NULL

spplot and Lattice Plots

To prepare for the lattice plots we seek here, we will need some front end code not included in the book...

In [13]:
%%R
library(gstat)
data(meuse)

#Set coordinates
coordinates(meuse) <- ~x+y

#Convert to grid object
data(meuse.grid) 
coordinates(meuse.grid) <- ~x+y
gridded(meuse.grid) <- T

#Introduce kriging (beyond scope of this script, but it's how the interpolation is done)
zn <- krige(zinc~1,meuse,meuse.grid)
zn$direct <- zn$var1.pred
zn$log <- exp(krige(log(zinc)~1,meuse,meuse.grid)$var1.pred)

Attaching package: ‘gstat’

The following object is masked from ‘package:spatstat’:

    idw

[inverse distance weighted interpolation]
[inverse distance weighted interpolation]

Now we can get a sense of what lattice provides us with. Note that z is defined by the gridded meuse data (~x+y) from above. We will plot this data with two methods, levelplot and spplot.

With levelplot, it can be seen that the first argument captures the specific data to be mapped (zx+y|name). The vertical bar indicates the desire to have this data (x+y) faceted by a given factor (name). The second argument captures the containing data.frame (spmap.to.lev(zn[c("direct", "log")])). The data are drawn, in this case, from the kriging operation above performed on the gridded meuse data (which resulted in zn). The result was then captured as 'direct', and then a new instance was created that logged the result (captured as 'log'). zn[c("direct", "log")] subsets zn to these two vectors.

Helper function spmap.to.lev converts zn (a SpatialPixelsDataFrame object) to a data.frame that has the grid values for both 'direct' and 'log' in a single column, accompanied by a column of factor labels ('direct' and 'log'). The rest of the arguments set the auxillary characteristics of the plot.

In [14]:
%%R
library(lattice)

print(levelplot(z~x+y|name, spmap.to.lev(zn[c("direct", "log")]), asp = "iso",
        cuts=4, col.regions=grey.colors(5, 0.90, 0.50, 2.2)),
	split = c(1,1,1,2), more = TRUE)
print(levelplot)
function (x, data, ...) 
UseMethod("levelplot")
<environment: namespace:lattice>

spplot, by contrast, wraps much of this functionality automatically. All that is required is the original data.frame (zn[c("direct", "log")]), and a few parameters indicating the number and type of colors, etc.

In [15]:
%%R
print(spplot(zn[c("direct", "log")], cuts=4,
        col.regions=grey.colors(5, 0.90, 0.50, 2.2)), split = c(1,2,1,2))
print(spplot)
function (obj, ...) 
standardGeneric("spplot")
<environment: 0x4157d08>
attr(,"generic")
[1] "spplot"
attr(,"generic")attr(,"package")
[1] "sp"
attr(,"package")
[1] "sp"
attr(,"group")
list()
attr(,"valueClass")
character(0)
attr(,"signature")
[1] "obj"
attr(,"default")
`\001NULL\001`
attr(,"skeleton")
(function (obj, ...) 
stop("invalid call in method dispatch to 'spplot' (no default method)", 
    domain = NA))(obj, ...)
attr(,"class")
[1] "standardGeneric"
attr(,"class")attr(,"package")
[1] "methods"

ggmap

ggmap offers an alternative to sp based plotting that relies on the Grammar of Graphics. The drawback is substantially less flexibility with respect to coordinate reference systems. To borrow Hadley Wickham's own words:

"Since ggplot2 is an implementation of the layered grammar of graphics, every plot made with ggplot2 has each of the above elements. Consequently, ggmap plots also have these elements, but certain elements are fixed to map components : the x aesthetic is fixed to longitude, the y aesthetic is fixed to latitude, and the coordinate system is fixed to the Mercator projection."

The upside to fixing these elements is consistent scaling, which turns out to be a very strong positive when faceting.

There are two major steps involved: >1. Use get_map to download images and format them for plotting; and,

  1. Use ggmap to actually plot.

As far as getting basic spatial information is concern, geocode accepts not only coordinates, but also strings...

In [16]:
%%R
library(ggmap)

print(geocode('the white house'))
Loading required package: ggplot2
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=the+white+house&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms
       lon      lat
1 -77.0365 38.89768

Maps are easily brought in centered on a point location, with zoom sitting in the range [3,20]. qmap serves as a convenience mapping function that wraps the finer resolution control of ggmap.

In [17]:
%%R
au<-'american university'
print(qmap(au,zoom=14))
Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=american+university&zoom=14&size=%20640x640&scale=%202&maptype=terrain&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=american+university&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms

Google Maps is the default pull, but other sources such as OpenStreetMap are also available.

In [18]:
%%R
print(qmap(au,zoom=14),source='osm')
Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=american+university&zoom=14&size=%20640x640&scale=%202&maptype=terrain&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=american+university&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms

The appearance of the map, or tile style, is driven by the map source. get_map is actually a wrapper function that covers four different sources: >1. Google Maps

  1. OpenStreetMap
  1. Stamen Maps
  1. CloudMade Maps

The latter two provide the most versatility in tile options. CloudMade, in particular, provides access to thousands of user-created tiles (although you need an API key). There are some pretty cool tile designs in CloudMade (and there is an editor to create your own).

In [19]:
%%R
print(qmap('the white house',zoom = 14,maptype = 53428,api_key='439cf1240ef4468ebb4277d0d31a6570',source = "cloudmade"))
Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=the+white+house&zoom=14&size=%20640x640&maptype=terrain&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=the+white+house&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms

The get_map function actually returns a raster object (like a SpatialPixel object)...

In [20]:
%%R
co<-get_map(location='colorado')
print(str(co))
Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=colorado&zoom=10&size=%20640x640&scale=%202&maptype=terrain&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=colorado&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms
 chr [1:1280, 1:1280] "#D8ECD5" "#D8ECD5" "#D1DCC4" "#8BA081" ...
 - attr(*, "class")= chr [1:2] "ggmap" "raster"
 - attr(*, "bb")='data.frame':	1 obs. of  4 variables:
  ..$ ll.lat: num 39.2
  ..$ ll.lon: num -106
  ..$ ur.lat: num 39.9
  ..$ ur.lon: num -105
NULL

Note how the data come in coordinate pairs of character values. Each value is a hexadecimal color address. The object carries to main attribute: >1. class = ['ggmap','raster']

  1. bb (or Bounding Box) has two coordinate pairs, lat/long for the lower left and upper right corners.

ggmap takes this raster object and plots it using the following basic ggplot call of reasonably familiar structure:

In [21]:
"""
ggplot(aes(x=lon, y=lat), data=fourCorners) +
geom_blank() + 
coord_map("mercator") +
annotation_raster(ggmap, xmin, xmax, ymin, ymax)
"""
Out[21]:
'\nggplot(aes(x=lon, y=lat), data=fourCorners) +\ngeom_blank() + \ncoord_map("mercator") +\nannotation_raster(ggmap, xmin, xmax, ymin, ymax)\n'

fourCorners here is the data.frame created via the expand.grid function being called on the single row data.frame of bounding box parameters.

In [22]:
%%R
print('***BOUNDING BOX***')
print(attr(co,'bb'))
print(cat('\n'))
print('***EXPAND.GRID(BOUNDING BOX)***')
print(expand.grid(lon=c(attr(co,'bb')$ll.lon,attr(co,'bb')$ur.lon),lat=c(attr(co,'bb')$ll.lat,attr(co,'bb')$ur.lat)))
[1] "***BOUNDING BOX***"
    ll.lat    ll.lon   ur.lat    ur.lon
1 39.20984 -106.2208 39.88754 -105.3419

NULL
[1] "***EXPAND.GRID(BOUNDING BOX)***"
        lon      lat
1 -106.2208 39.20984
2 -105.3419 39.20984
3 -106.2208 39.88754
4 -105.3419 39.88754

Additional arguments include the following: >1. extent - Captures how much of the graphics device is covered by the map. Options are ['normal','device','panel']. 'device' is default and fills the entire graphics device without axes. 'panel' fills the device with axes. 'normal' leaves axis padding.

  1. base_layer - The default base layer is ggplot(aes(x=lon, y=lat), data=fourCorners). This argument allows one to substitute a new base layer, which is generally required for faceting because facet_wrap and facet_grid work of said layer. (Setting maprange=TRUE ensures that the map itself determines the x and y axis limits.)
  1. legend - Determines legend location ['left','right','bottom','top','topleft','bottomleft','topright','bottomright','none']
  1. padding - Determines distance of legend from corner when on top.
  1. darken - tints the map image on the [0,1] interval.

Note that ggmap returns a ggplot object, which can be used as a base layer. This enable smalle tweaks from additional parameters in subsequent plots.

We will now get into an example provided by Hadley to demonstrate the functionality. It examines pre-geocoded violent crime in my hometown of Houston, TX.

In [23]:
%%R
#Examine crime data (taken from the ggmap library)
str(crime)
'data.frame':	86314 obs. of  17 variables:
 $ time    : POSIXt, format: "2010-01-01 01:00:00" "2010-01-01 01:00:00" ...
 $ date    : chr  "1/1/2010" "1/1/2010" "1/1/2010" "1/1/2010" ...
 $ hour    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ premise : chr  "18A" "13R" "20R" "20R" ...
 $ offense : Factor w/ 7 levels "aggravated assault",..: 4 6 1 1 1 3 3 3 3 3 ...
 $ beat    : chr  "15E30" "13D10" "16E20" "2A30" ...
 $ block   : chr  "9600-9699" "4700-4799" "5000-5099" "1000-1099" ...
 $ street  : chr  "marlive" "telephone" "wickview" "ashland" ...
 $ type    : chr  "ln" "rd" "ln" "st" ...
 $ suffix  : chr  "-" "-" "-" "-" ...
 $ number  : int  1 1 1 1 1 1 1 1 1 1 ...
 $ month   : Ord.factor w/ 8 levels "january"<"february"<..: 1 1 1 1 1 1 1 1 1 1 ...
 $ day     : Ord.factor w/ 7 levels "monday"<"tuesday"<..: 5 5 5 5 5 5 5 5 5 5 ...
 $ location: chr  "apartment parking lot" "road / street / sidewalk" "residence / house" "residence / house" ...
 $ address : chr  "9650 marlive ln" "4750 telephone rd" "5050 wickview ln" "1050 ashland st" ...
 $ lon     : num  -95.4 -95.3 -95.5 -95.4 -95.4 ...
 $ lat     : num  29.7 29.7 29.6 29.8 29.7 ...

We need a reasonable extent, and we can find this via trial and error by zooming. The critical feature, however, is using gglocator to grab the bounding box coordinates of our desired extent. It works much like the locator functions used with sp, and is applicable for any ggplot2 object. The kicker, however, is that it's only effective in interactive sessions because it requires manual intervention. gglocator defaults to one click on the map (like locator), and passing an argument of 2 permits two clicks. One click yields a point, two yields a rectangle which can be used as the bounding box.

In [25]:
%%R
#Hadley already figured out a desirable extent
print(qmap('houston',zoom=13))

#Use gglocator to grab bounding box data
#gglocator(2)

#Look at gglocator function
print(gglocator)
Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=houston&zoom=13&size=%20640x640&scale=%202&maptype=terrain&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=houston&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms
function (n = 1, message = FALSE, xexpand = c(0.05, 0), yexpand = c(0.05, 
    0)) 
{
    if (n > 1) {
        df <- NULL
        for (k in 1:n) {
            df <- rbind(df, gglocator(message = message, xexpand = xexpand, 
                yexpand = yexpand))
        }
        return(df)
    }
    x <- grid.ls(print = message)$name
    x <- x[grep("panel.", x)][1]
    seekViewport(x)
    loc <- as.numeric(grid.locator("npc"))
    object <- last_plot()
    xrng <- with(object, range(data[, deparse(mapping$x)]))
    yrng <- with(object, range(data[, deparse(mapping$y)]))
    xrng <- scales::expand_range(range = xrng, mul = xexpand[1], 
        add = xexpand[2])
    yrng <- scales::expand_range(range = yrng, mul = yexpand[1], 
        add = yexpand[2])
    point <- data.frame(xrng[1] + loc[1] * diff(xrng), yrng[1] + 
        loc[2] * diff(yrng))
    names(point) <- with(object, c(deparse(mapping$x), deparse(mapping$y)))
    point
}
<environment: namespace:ggmap>

Now to massage the data a bit, we need only violent crimes in the desired area. It will also be useful to provide an ordinal ranking of severity. In order of increasing severity: >1. robbery

  1. aggravated assault
  1. rape
  1. murder
In [26]:
%%R
#Subset to violent crimes
violent_crimes <- subset(crime,
                         offense!='auto theft' & offense!='theft' & offense!= 'burglary')

#Restrict to only the downtown area (the bounding box would have been determined by gglocator in Rstudio rather than Notebook)
violent_crimes <- subset(violent_crimes,
                         -95.39681<=lon & lon<=-95.34188 & 29.73631<=lat & lat<=29.78400)

#Provide ordinal ranking for crimes
violent_crimes$offense<-factor(violent_crimes$offense,
                               levels=c('robbery','aggravated assault','rape','murder'))

Now we can actually map the violent crime data using one of the many themes at ggmap's disposal.

In [27]:
%%R
#Set theme
theme_set(theme_bw(16))

#Generate base map
HoustonMap <- qmap("houston", zoom = 14, color = "bw", legend = "topleft")

#Incorporate data
print(HoustonMap + geom_point(aes(x = lon, y = lat,colour = offense, size = offense),data = violent_crimes))
Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=houston&zoom=14&size=%20640x640&scale=%202&maptype=terrain&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=houston&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms

Not bad looking, but it is point data, so Hadley uses this as an opportunity to deal with overplotting via binning. One simple way of doing it is to ignore frequency and just create a map of sectors demonstrating where crime is happening. This method also obscures an understanding of multiple types of violent crime in a given areas. It does, however, have the virtue of being easier to see.

More importantly, this plot demonstrates that we can reformulate the data to provide a different view within the plotting script. This is the advantage that ggplot brings over matplotlib.

In [29]:
%%R
print(HoustonMap + stat_bin2d(aes(x=lon,y=lat,colour=offense,fill=offense), 
                        size=.5,bins=30,alpha=.5,data=violent_crimes))

As noted before, we ignored frequency in creating the binned image. A good way to focus on frequency would be to aggregate all violent crimes and examine the gradient of incidence.

In [41]:
%%R
#Create a new map of Houston
houston<-get_map('houston',zoom=14)

#Create new base map
HoustonMap<-ggmap(houston, extent='device', legend='topleft')

#Generate density plot
print(HoustonMap + stat_density2d(aes(x=lon,y=lat,fill=..level..,alpha=..level..),
                            size=2,bins=5',data=violent_crimes,geom='polygon'))
#print(violent_crimes$..level..)
Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=houston&zoom=14&size=%20640x640&scale=%202&maptype=terrain&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=houston&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms

The '..level..' parameter seems to capture the aggregate count. It does this by capturing the height of the density value. If we need to clarify the distinction between the map and the color gradient, an inset can be used.

Finally, we get to faceting. What if we want to see the crime hotspots by day of the week? Below we split the data across days and change the color scheme.

In [44]:
%%R -w 800 -h 1000

#Generate new map of Houston
houston<-get_map(location='houston',zoom=14,color='bw',source='osm')

#Generate new base map
HoustonMap<-ggmap(houston,base_layer=ggplot(aes(x=lon,y=lat),data=violent_crimes))

#Generate plot similar to above + faceting
print(HoustonMap + stat_density2d(aes(x=lon,y=lat,fill=..level..,alpha=..level..),
                            bins=5,geom='polygon',data=violent_crimes) +
            scale_fill_gradient(low='black',high='red') + 
            facet_wrap(~day))
Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=houston&zoom=14&size=%20640x640&maptype=terrain&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=houston&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms

The contour 'clipping' (particularly visible on Sunday) is a known issue to be fixed in future versions of ggplot2. However, it does not obscure the hotspot identification.

Geocoding

ggmap has a utility function called geocode which seeks to cut out the middle man in translating features and addresses to mappable locations. There are three options ['latlon','more','all'] that provide increasingly detailed information from the JSON tree retrieved from Google's Geocoding API.

In [53]:
%%R
#Simple output
print(geocode('1101 4th St. SW, Washington DC 20024',output='latlon'))
cat('\n')

#More detail
print(geocode('1101 4th St. SW, Washington DC 20024',output='more'))
cat('\n')

#Entire JSON tree (commented out because it's really long)
#print(geocode('1101 4th St. SW, Washington DC 20002',output='all'))
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=1101+4th+St.+SW,+Washington+DC+20024&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms
        lon      lat
1 -77.01799 38.87769

Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=1101+4th+St.+SW,+Washington+DC+20024&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms
        lon      lat           type loctype
1 -77.01799 38.87769 street_address rooftop
                                               address    north    south
1 1101 4th street southwest, washington, dc 20024, usa 38.87904 38.87634
       east      west postal_code       country administrative_area_level_2
1 -77.01664 -77.01934       20024 united states        district of columbia
  administrative_area_level_1   locality               street streetNo
1        district of columbia washington 4th street southwest     1101
  point_of_interest                                query
1              <NA> 1101 4th St. SW, Washington DC 20024


revgeocode goes the other way, converting lon/lat into physical addresses...

In [54]:
%%R
ocfo<-geocode('1101 4th St. Southwest, Washington DC 20024')
ocfo<-as.numeric(ocfo)

print(revgeocode(ocfo))
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=1101+4th+St.+Southwest,+Washington+DC+20024&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.8776872,-77.0179904&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms
[1] "1101 4th Street Southwest, Washington, DC 20024, USA"

Distance Computation

Google also has a Distance Matrix API that provides the travel distance and time between two locations for driving, bicycling, and walking modes of transportation. Initially, I was concerned that this was not quite as useful as a straight line distance between polygon centroids for my purposes, but I have now considered that economic resources must use some mode of transportation. Perhaps it's straight line distances that are more unrealistic...

In [56]:
%%R
#Enter origins and destinations
from<-c('1101 4th St. Southwest, Washington DC 20024','the white house','22 Bryant St. NE, Washington DC 20002')
to<-c('the white house','22 Bryant St. NE, Washington DC 20002','1101 4th St. Southwest, Washington DC 20024')

#Query distances
print(mapdist(from,to))
Information from URL : http://maps.googleapis.com/maps/api/distancematrix/json?origins=1101+4th+St.+Southwest+Washington+DC+20024&destinations=the+white+house&mode=driving&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms
Information from URL : http://maps.googleapis.com/maps/api/distancematrix/json?origins=22+Bryant+St.+NE+Washington+DC+20002&destinations=1101+4th+St.+Southwest+Washington+DC+20024&mode=driving&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms
Information from URL : http://maps.googleapis.com/maps/api/distancematrix/json?origins=the+white+house&destinations=22+Bryant+St.+NE+Washington+DC+20002&mode=driving&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms
                                         from
1 1101 4th St. Southwest, Washington DC 20024
2                             the white house
3       22 Bryant St. NE, Washington DC 20002
                                           to    m    km    miles seconds
1                             the white house 3551 3.551 2.206591     553
2       22 Bryant St. NE, Washington DC 20002 5077 5.077 3.154848     842
3 1101 4th St. Southwest, Washington DC 20024 6248 6.248 3.882507     631
    minutes     hours
1  9.216667 0.1536111
2 14.033333 0.2338889
3 10.516667 0.1752778

As can be seen, the output is a convenient data.frame. The default mode is driving, and as with geocoding, varying levels of detail are available via an output argument.

Working with Shapefiles

ggmap has simplified the process of working with shapefiles, and eliminated the need to use a commercial GIS system to do so. The basic method involves first reading in a shapefile as a SpatialPolygonsDataFrame object, and then converting it to a data.frame via a function called fortify. Creating areal maps is as easy as merging data with the 'fortified' data.frame. Additional layers can be added by appending more geom layers.

In [72]:
%%R
library(maptools)
library(gpclib)
library(sp)

#Note that we are reading in this data with sp into a SpatialPolygonsDataFrame
DCshp<-readShapeSpatial('/home/choct155/Google Drive/Dissertation/Data/spatial/DC/tl_2012_11_tract.shp',proj4string=CRS('+proj=longlat +datum=WGS84'))
#print(DCshp)

#Convert to r data.frame
dc<-fortify(DCshp)
print(head(dc))
cat('\n')

#Create base map
dcmap<-qmap('washington dc',zoom=11,maptype='toner',source='stamen')
print(dcmap)

#Combine census tract data with base map
print(dcmap + geom_polygon(aes(x=long,y=lat,group=group),
                     data=dc,colour='white',fill='red',alpha=.5,size=.2))
Regions defined for each Polygons
       long      lat order  hole piece group id
1 -77.00528 38.90936     1 FALSE     1   0.1  0
2 -77.00514 38.90985     2 FALSE     1   0.1  0
3 -77.00496 38.91048     3 FALSE     1   0.1  0
4 -77.00494 38.91058     4 FALSE     1   0.1  0
5 -77.00479 38.91112     5 FALSE     1   0.1  0
6 -77.00458 38.91186     6 FALSE     1   0.1  0

Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=washington+dc&zoom=11&size=%20640x640&maptype=terrain&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=washington+dc&sensor=false
Google Maps API Terms of Service : http://developers.google.com/maps/terms